Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

source-zendesk-support: fix Groups and Organizations streams #2007

Merged
merged 2 commits into from
Oct 2, 2024

Conversation

Alex-Bair
Copy link
Contributor

@Alex-Bair Alex-Bair commented Oct 1, 2024

Description:

The Groups and Organizations streams were missing data. This PR addresses the issues causing these streams to miss data.

Groups

Deleted groups were not being retrieved previously. By including the exclude_deleted query param and setting it to false, the Zendesk Support API is now returning deleted groups.

Organizations

Only 10k organizations were being retrieved previously. The Organizations stream was using page numbers to paginate through results. This is an older, but still supported, pagination method for Zendesk Support, and they only allow you to paginate through 10k results even if there are more. Using the page number based pagination method was causing us to only retrieve 10k documents and miss any remaining ones.

I've updated the Organizations stream to use Zendesk's time-based incremental export endpoint. This endpoint paginates by providing a start time, and all resources created/updated at or after that start time are returned. This lets us retrieve all resources instead of just a subset. This pagination strategy does have the potential downside of getting stuck in a loop if 1000+ (the max number of results per page) resources are updated at the exact same time, but there aren't currently better ways to retrieve incremental updates for Organizations.

Capture snapshot changes are expected. The new time-based incremental export endpoint adds a deleted_at property to each organization resource.

Workflow steps:

(How does one use this feature, and how has it changed)

Documentation links affected:

(list any documentation links that you created, or existing ones that you've identified as needing updates, along with a brief description)

Notes for reviewers:

Tested on a local stack with a Zendesk account that had deleted groups and more than 10k organizations. Confirmed:

  • deleted groups are now retrieved
  • more than 10k organizations are retrieved
  • updated tests pass

All Organizations bindings for existing tasks need to be backfilled after merging to clear out previous state.


This change is Reviewable

@Alex-Bair Alex-Bair added the change:unplanned Unplanned change, useful for things like doc updates label Oct 1, 2024
"""
Converts a date-time formatted string to a millisecond UNIX timestamp.
"""
return int(datetime.fromisoformat(dt_str).timestamp())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is actually going to return seconds - the float result has the integer part as seconds, and the fractional part as microseconds. We have similar helpers elsewhere that account for this to get millisecond epoch times.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right, thanks for catching that. I was wrong in that function description. Zendesk's timestamps are in seconds, not milliseconds. I'll update the function description to say "...to a UNIX timestamp in seconds".

Also, the reason I stored the state as a date-time formatted string instead of an actual datetime objects was because when I retrieved the state after restarting the task, the retrieved state wasn't a datetime object but was the ISO formatted string. I'll update the helper function names to be clearer that they're accepting/returning a string and not a datetime object.

"""
API docs: https://developer.zendesk.com/api-reference/ticketing/ticket-management/incremental_exports/#incremental-organization-export

Note: This stream theoretically can get stuck in a loop if 1000+ organizations are
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a blocker, but per the API docs it kind of implies that this won't actually be possible, since it sounds like it'll keep returning results until one of them has a different timestamp?

... The 1000-item limit may be exceeded if items share the same timestamp ...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, the API docs do make it seem like it wouldn't be an issue. But we have seen other streams that used this time-based incremental export method get stuck in a loop if 1000+ resources are updated at the same time (ex: we had to move the Tickets stream to a different pagination method because it was stuck requesting the same timestamp - PR was #1789).

I don't remember the exact number of tickets that were updated at the same time in that instance, but I think it was multiple thousands. I suspect there's a limit to how much higher that 1000-item limit can go. So if there are more resources updated in the same second than 1000 + whatever the flex amount is, the stream will get stuck.

@Alex-Bair Alex-Bair force-pushed the bair/zendesk-support-fix-streams branch from 58ce1d6 to 221b14a Compare October 2, 2024 15:30
…l time-based export

The Organizations stream was previously using page numbers to retrieve
results from Zendesk. This is an older pagination method for Zendesk
Support, and we can only retrieve 10,000 documents using this endpoint.
After 10k documents, Zendesk cuts us off. This meant that we weren't
retrieving all organizations for users with more than 10k organizations.

We can switch the Organizations stream to use "time-based" incremental
exports. Instead of a page number, we use a specific start time to
request all organizations that have been updated at or after that start
time. Using this pagination strategy does have the potential downside of
getting stuck in a loop if 1000+ organizations are updated at the exact
same time, but there are not currently better ways to retrieve
incremental updates for organizations.

The new `SourceZendeskSupportIncrementalTimeExportStream` is an
improvement on the existing `SourceZendeskSupportIncrementalExportStream`
in almost every way. At a later date, we could consider refactoring out
`SourceZendeskSupportIncrementalExportStream` since the new class should
serve the same purpose even better. Doing so would require a lot of time
intensive refactoring & inheritance un-tangling that are so I've
left that for future work.
Copy link
Member

@williamhbaker williamhbaker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@Alex-Bair Alex-Bair force-pushed the bair/zendesk-support-fix-streams branch from 221b14a to 387feb6 Compare October 2, 2024 15:35
@Alex-Bair Alex-Bair merged commit b84f7f0 into main Oct 2, 2024
66 of 75 checks passed
@Alex-Bair Alex-Bair deleted the bair/zendesk-support-fix-streams branch October 2, 2024 21:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
change:unplanned Unplanned change, useful for things like doc updates
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants